Foreground and background text in retrieval

Authors

  • Jussi Karlgren
  • Timo Järvinen
Abstract

Our hypothesis is that certain clauses have foreground functions in text, while other clauses have background functions, and that these functions are expressed or reflected in the syntactic structure of the clause. Presumably these clauses will have differing utility for automatic approaches to text understanding: a summarization system might want to use background clauses to capture commonalities between a number of documents, while an indexing system might use foreground clauses to capture specific characteristics of a certain document.

Topic in text for information access

This paper gives a short description of a series of experiments we have performed to test our hypotheses that clauses have different functions in transmitting the information flow of text, namely the functions often called topicality or thematic structure. We chose to evaluate our hypotheses in the application area of text analysis for information retrieval.

Topicality, foreground, and background

A substantial body of research is devoted to uncovering the topical structure of clauses and texts. There is a long tradition of semantic and pragmatic study of clause structure from the Charles University in Prague (e.g. Hajičová, 1993), there are several results supporting our hypotheses using the general theory of transitivity (Halliday, 1967, 1978; Hopper, 1979), there are a number of algorithms for anaphor resolution which touch on clausal categorization, there are studies of automatic summarization algorithms, and there are studies of text grammars, all of which have bearing on our work. However, no studies have been made specifically on clausal categorization for topical analysis, and the empirical validation of these ideas has been held back for lack of effective tools.

Transitivity and clauses

Transitivity is one of the most basic notions in the system of language, but it is poorly formalized in the formal study of language.
Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Clauses in language represent events and processes of various kinds, and transitivity is that characteristic of a clause which models the character of the process or event it represents. This systemic model was first formulated by Halliday (1967) and has since been elaborated by Hopper and others in a theoretical sense; very little empirical study on large numbers of texts has been performed, and no systematic, let alone quantitative, evaluation of the theories has even been proposed. One of the basic conceptual structures of language in use is that actions are done by people and affect things. How the action is performed, by whom, and on what are all encoded in the clause by various syntactic mechanisms, in a general system of transitivity. For most non-linguists, transitivity is only explicitly mentioned in foreign-language classes when classifying verbs as transitive or intransitive, that is, according to whether the verb in question takes a direct object or prefers not to. This is of course central to the task of modeling action and effect, but transitivity covers more than this one aspect of process structure. Halliday’s model mentions a number of specific factors or “systems” that cover the more general “system” of transitivity (e.g. 1978, p. 118):

  • Number, type, and role of participant: human or not? Agent? Benefactive?
  • Process type: existence, possession, spatial/locative, spatial/mobile.

These aspects of clausal organization hook up with factors such as temporal, aspectual, or mood systems to produce a clause. This clause not only carries information about the event or process it represents, but it also crucially builds a text, together with adjacent clauses. In Halliday’s model (most comprehensively delineated in his 1967 publication) a clause is the confluence of three systems of syntactic choice: transitivity, mood, and theme.
Transitivity, he writes, is the set of options relating to cognitive content, mood being the system for organizing the utterance into a speech situation, and theme being the system for organizing the utterance into a discourse. While there is ample psycholinguistic evidence that the syntactic form of a clause is discarded after being processed by the hearer or reader (e.g. Jarvella, 1979), the communicative structure of the clause is retained to organize the information content of the text or discourse. The structure of a clause is not arbitrary, and cannot be determined in isolation from other clauses in the vicinity and other events, processes, and participants represented and mentioned in the

Table 1: Transitivity characteristics.


Similar resources

Enhancement of Learning Based Image Matting Method with Different Background/Foreground Weights

The problem of accurate foreground estimation in images is called image matting. In image matting methods, a map is used as learning data, which is produced from those pixels that are definitely foreground, definitely background, and unknown. This three-level pixel map is often referred to as a trimap, which is produced manually in alpha matte datasets. The true class of unknown pixels will be es...


Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction, and document classification. A document image is enhanced in the preprocessing stage by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...


Content Based Image Retrieval in Presence of Foreground Disturbances

In this paper we analyze the image retrieval problem in the presence of possible foreground disturbances. The foreground may be irrelevant for the retrieval, but it occludes the background and hence reduces the retrieval accuracy. We propose the use of a video as a query so that the moving foreground can be extracted. The segmented foreground region is subsequently filled in to increase the retrieval...


Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field which has received great attention in recent decades. In this paper, we present an approach to image retrieval based on the combination of text-based and content-based features. Keywords have been used as text-based features, and color and texture as content-based features. Query in this system contains some keywords and an input...


The influence of ‘Attār’s Asrār Nāmeh on Shabestari's Golshan Rāz based on the theory of intertextuality

Intertextuality is the overt or covert appearance of a contemporary or earlier text in another, and texts with similar themes can more often impact each other intertextually. From this perspective, the intertextual impact of ‘Attār’s Asrār Nāmeh, as the background text, on Shabestari's Golshan Rāz, as the foreground text, was studied. It is argued that this intertextual impact ...


Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: the current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features have been apportioned for each group of images separately. In this regard, each group image surr...



Publication date: 2002